Abstract

In this paper, we first present a new concept of 'weight' for the 64 triplets and define a different weight for each kind
of triplet. Then, we give a novel 2D graphical representation for DNA sequences, which transforms a DNA
sequence into a plot set to facilitate quantitative comparison of DNA sequences. Thereafter, in association with a
newly designed measure of similarity, we introduce a novel approach to the similarity/dissimilarity analysis of
DNA sequences. Finally, an application to the similarity/dissimilarity analysis of the complete coding sequences of the
β-globin genes of 11 species illustrates the utility of the proposed method.
1. Introduction

In recent years, biologists have observed an exponential
growth of sequence data in DNA databases. The importance
of understanding genetic sequences, coupled with the
difficulty of working with such immense volumes of DNA
sequence data, underscores the urgent need for supportive
visual tools. Graphical representation is well regarded
because it offers visual inspection of data and provides a
simple way to facilitate the similarity analysis and
comparison of DNA sequences [1-5]. Because of their
convenience and ease of use, methods based on graphical
representation have been extensively applied in many areas
of bioinformatics.

Until now, many different graphical representation methods
have been proposed to numerically characterize DNA
sequences in spaces of different dimensions. For example,
Liao et al. [6-9], Randic et al. [10-13], Guo et al. [14,15],
Qi et al. [16], Dai et al. [17,18], and Dorota et al. [19]
proposed different 2D graphical representations of DNA
sequences. Liao et al. [20-23], Randic et al. [24,25],
Qi et al. [26], Yu et al. [27], and Aram et al. [28] proposed
different 3D graphical representations. Liao et al. [29],
Tang et al. [30], and Chi et al. [31] proposed different 4D
graphical representations. In addition, Liao et al. [32]
proposed a 5D representation of DNA sequences.

Most of the approaches mentioned above adopt the leading
eigenvalues of some matrices, such as L/L matrices, M/M
matrices, E matrices, covariance matrices, and D/D matrices,
to weigh the similarities/dissimilarities among the complete
coding sequences of β-globin genes of different species.
Because matrix computation is needed to obtain the leading
eigenvalues, these methods are usually computationally
expensive for long DNA sequences. Furthermore, the
similarity/dissimilarity results of some of these approaches
are not quite reasonable, and some results do not accord
with known facts [7,9].

To reduce the computational complexity and obtain more
reasonable results of similarity/dissimilarity analysis of
DNA sequences, in this article we propose a new 2D graphical
representation of DNA sequences based on triplets, in which
we present a new concept of 'weight' for the 64 triplets and
a new concept of 'weight deviation' to weigh the
similarities/dissimilarities among the complete coding
sequences of β-globin genes of different species. Compared
with some existing graphical representations of DNA
sequences, our new scheme has the following advantages:
(1) no matrix computation is needed, and (2) it can
characterize the graphical representations of DNA sequences
exactly and obtain reasonable results of
similarity/dissimilarity analysis of DNA sequences.
2. Proposed 2D graphical representation of DNA sequence

A codon is a specific sequence of three adjacent nucleotides
on the mRNA that specifies the genetic code information for
synthesizing a particular amino acid. As illustrated in
Table 1, there are in total 20 amino acids and 64 codons in
nature, and each of these codons has a specific meaning in
protein synthesis: 61 codons represent amino acids and the
other 3 codons cause the termination of protein synthesis.

For the 64 codons illustrated in Table 1, their corre- 
sponding triplets of DNA are illustrated in Table 2. 

Based on the 64 triplets of DNA illustrated in Table 2, we
define a new mapping $W$ that maps each of these triplets to
a different weight. The mapping $W$ shall satisfy the
following rule: for any two pairs of triplets $(X_1, Y_1)$
and $(X_2, Y_2)$, if the corresponding codons of $X_1$ and
$Y_1$ code for the same amino acid but the corresponding
codons of $X_2$ and $Y_2$ code for two different amino
acids, then $|W(X_1) - W(Y_1)| < |W(X_2) - W(Y_2)|$.
According to this rule and for the sake of convenience, each
weight consists of an amino acid part and a codon part: the
amino acid gives the integer part of the weight, and the
codon gives the fractional part. Alanine is defined as 1,
arginine as 2, and so on in the same manner. The codons of
every amino acid are ordered, so the weight of the first
codon of alanine (GCT) is 1.1. The detailed mapping rules of
$W$ are given in Table 3.
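
The rule above can be checked mechanically. The following Python sketch encodes the weights of Table 3 and compares the largest weight gap inside one group with the smallest gap between two groups. The names `TABLE3`, `W_TENTHS`, `GROUP`, `weight`, and `gap` are illustrative and not part of the paper, and treating the three stop triplets as one group (integer part 21) is an assumption taken from the layout of Table 3.

```python
from itertools import combinations

# Table 3 as (integer part, first fractional digit, triplets in table order).
TABLE3 = [
    (1, 1, ["GCT", "GCC", "GCA", "GCG"]),
    (2, 1, ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]),
    (3, 3, ["GAT", "GAC"]),
    (4, 1, ["AAT", "AAC"]),
    (5, 1, ["TGT", "TGC"]),
    (6, 1, ["GAA", "GAG"]),
    (7, 1, ["CAA", "CAG"]),
    (8, 1, ["GGT", "GGC", "GGA", "GGG"]),
    (9, 1, ["CAT", "CAC"]),
    (10, 1, ["ATT", "ATC", "ATA"]),
    (11, 1, ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"]),
    (12, 3, ["AAA", "AAG"]),
    (13, 1, ["TTT", "TTC"]),
    (14, 1, ["CCT", "CCC", "CCA", "CCG"]),
    (15, 1, ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]),
    (16, 3, ["ACT", "ACC", "ACA", "ACG"]),
    (17, 3, ["TGG"]),
    (18, 1, ["TAT", "TAC"]),
    (19, 1, ["GTT", "GTC", "GTA", "GTG"]),
    (20, 1, ["ATG"]),
    (21, 1, ["TAA", "TAG", "TGA"]),  # stop triplets (assumed to form one group)
]

# Weights are stored in tenths (integers) so that comparisons are exact.
W_TENTHS = {}
GROUP = {}
for integer, first, triplets in TABLE3:
    for k, triplet in enumerate(triplets):
        W_TENTHS[triplet] = 10 * integer + first + k
        GROUP[triplet] = integer


def weight(triplet):
    """Return W(triplet) as a float, e.g. weight("GCT") gives 1.1."""
    return W_TENTHS[triplet] / 10


def gap(a, b):
    """Absolute weight difference of two triplets, in tenths."""
    return abs(W_TENTHS[a] - W_TENTHS[b])


pairs = list(combinations(W_TENTHS, 2))
max_same = max(gap(a, b) for a, b in pairs if GROUP[a] == GROUP[b])
min_diff = min(gap(a, b) for a, b in pairs if GROUP[a] != GROUP[b])
print(max_same / 10, min_diff / 10)  # largest same-group gap, smallest cross-group gap
```

With these values the largest gap within a group is 0.5 and the smallest gap between groups is 0.7, which appears to be why the groups that follow a six-codon amino acid (for example aspartic acid after arginine) start at a fractional part of .3 rather than .1.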



Table 1 Relationship between the 20 most common amino acids and the 64 mRNA codons

Codons                        Amino acid     Codons                        Amino acid
GCU, GCC, GCA, GCG            Alanine        CUU, CUC, CUA, CUG, UUA, UUG  Leucine
CGU, CGC, CGA, CGG, AGA, AGG  Arginine       AAA, AAG                      Lysine
GAU, GAC                      Aspartic acid  AUG                           Methionine
AAU, AAC                      Asparagine     UUU, UUC                      Phenylalanine
UGU, UGC                      Cysteine       CCU, CCC, CCA, CCG            Proline
GAA, GAG                      Glutamic acid  UCU, UCC, UCA, UCG, AGU, AGC  Serine
CAA, CAG                      Glutamine      ACU, ACC, ACA, ACG            Threonine
GGU, GGC, GGA, GGG            Glycine        UGG                           Tryptophan
CAU, CAC                      Histidine      UAU, UAC                      Tyrosine
AUU, AUC, AUA                 Isoleucine     GUU, GUC, GUA, GUG            Valine
UAA, UAG, UGA                 (stop codons)









Table 2 The corresponding triplets of the 64 codons

Codons                        Corresponding triplets        Codons                        Corresponding triplets
GCU, GCC, GCA, GCG            GCT, GCC, GCA, GCG            CUU, CUC, CUA, CUG, UUA, UUG  CTT, CTC, CTA, CTG, TTA, TTG
CGU, CGC, CGA, CGG, AGA, AGG  CGT, CGC, CGA, CGG, AGA, AGG  AAA, AAG                      AAA, AAG
GAU, GAC                      GAT, GAC                      AUG                           ATG
AAU, AAC                      AAT, AAC                      UUU, UUC                      TTT, TTC
UGU, UGC                      TGT, TGC                      CCU, CCC, CCA, CCG            CCT, CCC, CCA, CCG
GAA, GAG                      GAA, GAG                      UCU, UCC, UCA, UCG, AGU, AGC  TCT, TCC, TCA, TCG, AGT, AGC
CAA, CAG                      CAA, CAG                      ACU, ACC, ACA, ACG            ACT, ACC, ACA, ACG
GGU, GGC, GGA, GGG            GGT, GGC, GGA, GGG            UGG                           TGG
CAU, CAC                      CAT, CAC                      UAU, UAC                      TAT, TAC
AUU, AUC, AUA                 ATT, ATC, ATA                 GUU, GUC, GUA, GUG            GTT, GTC, GTA, GTG
UAA, UAG, UGA                 TAA, TAG, TGA







For example, from Table 3, we have $W(\mathrm{GCT}) = 1.1$,
$W(\mathrm{GCC}) = 1.2$, $W(\mathrm{ATG}) = 20.1$, etc. In
addition, we can propose a novel 2D graphical representation
of DNA sequences as follows.

Let $G = g_1 g_2 g_3 \cdots g_N$ be an arbitrary DNA primary
sequence, where $g_i \in \{A, T, G, C\}$ for any
$i \in \{1, 2, \ldots, N\}$. We can then transform $G$ into a
sequence of triplets $G = t_1 t_2 t_3 \cdots t_M$, where
$M = [N/3]$ and $t_i$ is a triplet of DNA for any
$i \in \{1, 2, \ldots, M\}$. Thereafter, we can define a new
mapping $\Phi$ that maps $G$ into a plot set, as given in
formula (1).

$$\Phi(G) = \{(1, W(t_1)), (2, W(t_2)), \ldots, (M, W(t_M))\} \qquad (1)$$

As for the complete coding sequences of the β-globin genes
of the 11 species listed in Table 4, each of them can be
mapped into a plot set by using the mapping $\Phi$, and the
2D graphical representations corresponding to the complete
coding sequences of the β-globin genes of human, chimpanzee,
and opossum are shown in Figures 1, 2, and 3, respectively.
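
As an illustration of formula (1), the following Python sketch maps a DNA string to its plot set, that is, to the list of points (i, W(t_i)). The names `TABLE3`, `W`, and `plot_set` are hypothetical helpers introduced here, and reading [N/3] as the integer part of N/3 (so that trailing bases which do not fill a triplet are ignored) is an assumption.

```python
# Table 3 in compact form: the first weight of each group, then its triplets.
# Within a group, each following triplet adds 0.1 to the weight.
TABLE3 = """
1.1 GCT GCC GCA GCG
2.1 CGT CGC CGA CGG AGA AGG
3.3 GAT GAC
4.1 AAT AAC
5.1 TGT TGC
6.1 GAA GAG
7.1 CAA CAG
8.1 GGT GGC GGA GGG
9.1 CAT CAC
10.1 ATT ATC ATA
11.1 CTT CTC CTA CTG TTA TTG
12.3 AAA AAG
13.1 TTT TTC
14.1 CCT CCC CCA CCG
15.1 TCT TCC TCA TCG AGT AGC
16.3 ACT ACC ACA ACG
17.3 TGG
18.1 TAT TAC
19.1 GTT GTC GTA GTG
20.1 ATG
21.1 TAA TAG TGA
"""

W = {}
for line in TABLE3.strip().splitlines():
    first, *triplets = line.split()
    base = round(float(first) * 10)  # work in tenths to avoid rounding drift
    for k, triplet in enumerate(triplets):
        W[triplet] = (base + k) / 10


def plot_set(sequence):
    """Return the points (i, W(t_i)) for i = 1..M, with M = len(sequence) // 3."""
    seq = sequence.upper()
    m = len(seq) // 3  # assumption: [N/3] is the integer part of N/3
    return [(i + 1, W[seq[3 * i:3 * i + 3]]) for i in range(m)]


print(plot_set("ATGGTGCACCT"))
```

For the 11-base input above, the triplets are ATG, GTG, and CAC, so the result is the three points (1, 20.1), (2, 19.4), and (3, 9.2). The list can be passed directly to any plotting library to draw curves like those in Figures 1 to 3.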

3. Similarity analysis of DNA sequence 

Let $G = g_1 g_2 g_3 \cdots g_N$ be an arbitrary complete
coding sequence, where $g_i \in \{A, T, G, C\}$ for any
$i \in \{1, 2, \ldots, N\}$, and let
$G = t_1 t_2 t_3 \cdots t_M$ be its corresponding sequence
of triplets, where $M = [N/3]$ and $t_i$ is a triplet of DNA
for any $i \in \{1, 2, \ldots, M\}$. Then, we define a
function $S$ and let $S(t_i)$ represent the total number of
times that the triplet
Table 3 The mapping rules of W

Triplet   Corresponding weight   Triplet   Corresponding weight
GCT       1.1                    CTT       11.1
GCC       1.2                    CTC       11.2
GCA       1.3                    CTA       11.3
GCG       1.4                    CTG       11.4
                                 TTA       11.5
                                 TTG       11.6
CGT       2.1                    AAA       12.3
CGC       2.2                    AAG       12.4
CGA       2.3
CGG       2.4
AGA       2.5
AGG       2.6
GAT       3.3                    TTT       13.1
GAC       3.4                    TTC       13.2
AAT       4.1                    CCT       14.1
AAC       4.2                    CCC       14.2
                                 CCA       14.3
                                 CCG       14.4
TGT       5.1                    TCT       15.1
TGC       5.2                    TCC       15.2
                                 TCA       15.3
                                 TCG       15.4
                                 AGT       15.5
                                 AGC       15.6
GAA       6.1                    ACT       16.3
GAG       6.2                    ACC       16.4
                                 ACA       16.5
                                 ACG       16.6
CAA       7.1                    TGG       17.3
CAG       7.2
GGT       8.1                    TAT       18.1
GGC       8.2                    TAC       18.2
GGA       8.3
GGG       8.4
CAT       9.1                    GTT       19.1
CAC       9.2                    GTC       19.2
                                 GTA       19.3
                                 GTG       19.4
ATT       10.1                   ATG       20.1
ATC       10.2
ATA       10.3
TAA       21.1
TAG       21.2
TGA       21.3

$t_i$ repeats in the sequence of triplets
$G = t_1 t_2 t_3 \cdots t_M$, for any $i \in \{1, 2, \ldots, M\}$.

Let $T_1$ = GCT, $T_2$ = GCC, $T_3$ = GCA, $T_4$ = GCG,
$T_5$ = CGT, $T_6$ = CGC, $T_7$ = CGA, $T_8$ = CGG,
$T_9$ = AGA, $T_{10}$ = AGG, $T_{11}$ = GAT, $T_{12}$ = GAC,
$T_{13}$ = AAT, $T_{14}$ = AAC, $T_{15}$ = TGT, $T_{16}$ = TGC,
$T_{17}$ = GAA, $T_{18}$ = GAG, $T_{19}$ = CAA, $T_{20}$ = CAG,
$T_{21}$ = GGT, $T_{22}$ = GGC, $T_{23}$ = GGA, $T_{24}$ = GGG,
$T_{25}$ = CAT, $T_{26}$ = CAC, $T_{27}$ = ATT, $T_{28}$ = ATC,
$T_{29}$ = ATA, $T_{30}$ = CTT, $T_{31}$ = CTC, $T_{32}$ = CTA,
$T_{33}$ = CTG, $T_{34}$ = TTA, $T_{35}$ = TTG, $T_{36}$ = AAA,
$T_{37}$ = AAG, $T_{38}$ = TTT, $T_{39}$ = TTC, $T_{40}$ = CCT,
$T_{41}$ = CCC, $T_{42}$ = CCA, $T_{43}$ = CCG, $T_{44}$ = TCT,
$T_{45}$ = TCC, $T_{46}$ = TCA, $T_{47}$ = TCG, $T_{48}$ = AGT,
$T_{49}$ = AGC, $T_{50}$ = ACT, $T_{51}$ = ACC, $T_{52}$ = ACA,
$T_{53}$ = ACG, $T_{54}$ = TGG, $T_{55}$ = TAT, $T_{56}$ = TAC,
$T_{57}$ = GTT, $T_{58}$ = GTC, $T_{59}$ = GTA, $T_{60}$ = GTG,
$T_{61}$ = ATG, $T_{62}$ = TAA, $T_{63}$ = TAG, and $T_{64}$ = TGA.

Thereafter, since there are a total of 64 triplets of DNA
(Table 2), we can construct a set of 64 vectors
$\{\langle T_1, S(T_1)\rangle, \langle T_2, S(T_2)\rangle, \ldots, \langle T_{64}, S(T_{64})\rangle\}$
for the given sequence of triplets
$G = t_1 t_2 t_3 \cdots t_M$ as follows: if $T_i = t_j$ for
some $t_j \in \{t_1, t_2, \ldots, t_M\}$, then
$S(T_i) = S(t_j)$; otherwise $S(T_i) = 0$, for any
$i \in \{1, 2, \ldots, 64\}$ and $j \in \{1, 2, \ldots, M\}$.

For convenience, we call
$\{\langle T_1, S(T_1)\rangle, \langle T_2, S(T_2)\rangle, \ldots, \langle T_{64}, S(T_{64})\rangle\}$
the triplet-repeat model set of $G$.
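
The triplet-repeat model set is simply a table of triplet counts in a fixed order. A minimal Python sketch is given below; `TRIPLETS` and `model_set` are illustrative names, and the sequence is assumed to be read in non-overlapping triplets from its first base, with any incomplete trailing triplet ignored.

```python
from collections import Counter

# The 64 triplets in the order T_1 ... T_64 given in the text.
TRIPLETS = (
    "GCT GCC GCA GCG CGT CGC CGA CGG AGA AGG GAT GAC AAT AAC TGT TGC "
    "GAA GAG CAA CAG GGT GGC GGA GGG CAT CAC ATT ATC ATA CTT CTC CTA "
    "CTG TTA TTG AAA AAG TTT TTC CCT CCC CCA CCG TCT TCC TCA TCG AGT "
    "AGC ACT ACC ACA ACG TGG TAT TAC GTT GTC GTA GTG ATG TAA TAG TGA"
).split()


def model_set(sequence):
    """Return [(T_1, S(T_1)), ..., (T_64, S(T_64))] for a coding sequence."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing triplet
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Triplets that never occur get the count 0, as in the definition of S.
    return [(t, counts.get(t, 0)) for t in TRIPLETS]


example = dict(model_set("ATGGTGGTGTAA"))
print(example["ATG"], example["GTG"], example["TAA"], example["GCT"])
```

In this toy input GTG occurs twice, ATG and TAA once each, and every other triplet, such as GCT, has the count 0.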

For any two given complete coding sequences $A$ and $B$,
suppose that their triplet-repeat model sets are
$\{\langle T_1, X_1\rangle, \langle T_2, X_2\rangle, \ldots, \langle T_{64}, X_{64}\rangle\}$
and
$\{\langle T_1, Y_1\rangle, \langle T_2, Y_2\rangle, \ldots, \langle T_{64}, Y_{64}\rangle\}$,
respectively. Then, on the basis of the 2D graphical
representation given in Section 2, we define the weight
deviation between the two DNA sequences $A$ and $B$ by
formula (2) to measure the similarity between $A$ and $B$.

$$WD(A, B) = \frac{\sum_{i=1}^{64} |X_i - Y_i| \cdot W(T_i)}{64} \qquad (2)$$

Formula (2) satisfies the property that the smaller the
weight deviation between the two DNA sequences A and B, the
higher the degree of similarity of A and B. According to
formula (2), the similarity/dissimilarity matrix obtained
for the coding sequences listed in Table 4 is given in
Table 5. A phylogenetic tree constructed from the similarity
matrix (Table 5) is shown in Figure 4.
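
The weight deviation can be sketched in a few lines of Python. The helper names `TABLE3`, `W`, `triplet_counts`, and `weight_deviation` are hypothetical. The normalization by the 64 triplet classes is an assumption of this sketch: it is consistent with the values reported in Table 5, but it should be checked against the printed form of formula (2).

```python
from collections import Counter

# Table 3 in compact form: first weight of each group, then its triplets.
TABLE3 = """
1.1 GCT GCC GCA GCG
2.1 CGT CGC CGA CGG AGA AGG
3.3 GAT GAC
4.1 AAT AAC
5.1 TGT TGC
6.1 GAA GAG
7.1 CAA CAG
8.1 GGT GGC GGA GGG
9.1 CAT CAC
10.1 ATT ATC ATA
11.1 CTT CTC CTA CTG TTA TTG
12.3 AAA AAG
13.1 TTT TTC
14.1 CCT CCC CCA CCG
15.1 TCT TCC TCA TCG AGT AGC
16.3 ACT ACC ACA ACG
17.3 TGG
18.1 TAT TAC
19.1 GTT GTC GTA GTG
20.1 ATG
21.1 TAA TAG TGA
"""

W = {}
for line in TABLE3.strip().splitlines():
    first, *triplets = line.split()
    base = round(float(first) * 10)  # tenths, to avoid rounding drift
    for k, triplet in enumerate(triplets):
        W[triplet] = (base + k) / 10


def triplet_counts(sequence):
    """Count non-overlapping triplets, ignoring an incomplete trailing one."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def weight_deviation(a, b):
    """Sum of |X_i - Y_i| * W(T_i) over the 64 triplets, divided by 64 (assumed)."""
    x, y = triplet_counts(a), triplet_counts(b)
    total = sum(abs(x[t] - y[t]) * w for t, w in W.items())
    return total / 64


a = "ATGGTGGTGTAA"  # ATG GTG GTG TAA
b = "ATGGTGCACTAA"  # ATG GTG CAC TAA
print(weight_deviation(a, b))
```

The two toy sequences differ by one GTG (weight 19.4) and one CAC (weight 9.2), so the numerator is 28.6 and the deviation is 28.6/64. Only one pass over each sequence and a 64-term sum are needed, which reflects the claim that no matrix computation is involved.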

From Table 5, it is easy to see that human, gorilla, and
chimpanzee are most similar to each other: the pairs
gorilla-chimpanzee (weight deviation 1.1266), human-gorilla
(weight deviation 4.3359), and human-chimpanzee (weight
deviation 5.2500) are the most similar species pairs,
whereas Gallus and opossum are the most dissimilar to the
others (with weight deviations greater than 11). This is
consistent with the fact that Gallus is not a mammal,
whereas the others



Table 4 The complete coding sequences of β-globin genes of 11 species



Species Complete coding sequence 

Human ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTOCTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTOGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCOTG^ 

TCCmGGGGATCTGTCCACTCCTGATGCTGTOTGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCmAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCmGCCACACT 
GAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTOAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACmGGCAAAGAATOACCCCACCAGTGCAGGCTGCCTATCAGAAA 
GTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAA 

Chimpanzee ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGmCTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTOGTGGTGAGGCCCTGGGCAGGTOGTATCAAGGCTGCTGGTGGTCTACCCTOGACCCAG 
AGGTCmGAGTCCmGGGGATCTGTCCACTCCTGATGCTGTOTGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCmAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCAC 
CmGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAAC^CAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACmGGCAAAG 



Gorilla, Black lemur, Norway rat, House mouse, Goat, Bovine, Rabbit, Opossum, and Gallus (sequences listed below in this order):

atggtgcacctgactcctgaggagaagtctgccgmctgccctgtggggcaaggtgaacgtggatgaagtogtggtgaggccctgggcaggctgctggtggtctaccctogacccagaggtomgagt 
ccmggggatctgtccactcctgatgctgtotgggcaaccctaaggtgaaggctcatggcaagaaagtgctcggtgccmagtgatggcctggctcacctggacaacctcaagggcaccmgccacactg 
agtgagctgcactgtgacaagctgcacgtggatcctgagaactoaagctcctgggcaatgtgctggtctgtgtgctggcccatcacmggcaaag 

atgacmgctgagtgctgaggagaatgctcatgtcacctctctgtggggcaaggtggatgtagagaaagtogtggcgaggccttgggcaggctgctggtcgtctacccatggacccagagg^ 
ccmggggacctgtcctctccttctgctgtotggggaaccctaaggtgaaggcccatggcaagaaggtgctgagtgccmagtgaaggtctgcatcacctggacaacctcaagggcaccmgctcaactg 
agtgagctgcactgtgacaagtocacgtggatcctcagaac^cactctcctgggcaacgtgctggtggtotgctggctgaacacmggcaatgcatoagcccggcggtgcaggctgccmcagaagg 
tggtggctggtgtggccaatgctctggctcacaagtaccactga 

atggtgcacctaactgatgctgagaaggctactg™gtggcctgtggggaaaggtgaatgctgataatgtogcgctgaggccctgggcaggctgctggtotctaccctogacccagaggtac^cta 
aamggggacctgtcctctgcctctgctatcatgggtaacccccaggtgaaggcccatggcaagaaggtgataaatgcctoaatgatggcctgaaacactogacaacctcaagggcaccmgctcatct 
gagtgaactccactgtgacaagctgcatgtggatcctgagaacttcaggctcctgggcaatatgatotgatotgtoggccaccacctgggcaaggaatoaccccctgtgcacaggctgcctocagaa 
ggtggtggctggagtggccagtgccctggctcacaagtaccactaa 

atggtgcacctgactgatgctgagaagtctgctgtctctocctgtgggcaaaggtgaaccccgatgaagtogtggtgaggccctgggcaggctgctggtotctaccctogacccagcggtacmgat^ 
cmggagacctatcctctgcctctgctatcatgggtaatcccaaggtgaaggcccatggcaaaaaggtgataactgccmaacgagggcctgaaaaacctggacaacctcaagggcaccmgccagcctc 
agtgagctccactgtgacaagctgcatgtggatcctgagaacttcaggctcctaggcaatgcgatcgtgatotgctgggccaccacctgggcaaggamcacccctgctgcacaggctgcctccagaagg 
tggtggctggagtggccactgccctggctcacaagtaccactaa 

atgctgactgctgaggagaaggctgccgtcaccggc^ctggggcaaggtgaaagtggatgaagtogtgctgaggccctgggcaggctgctggtotctacccctggactcagagg^cmgagcacm 
ggggacttgtcctctgctgatgctgtotgaacaatgctaaggtgaaggcccatggcaagaaggtgctagactccmagtaacggcatgaagcatctcacgacctcaagggcaccmgctcagctgagt 
gagctgcactgtgataagctgcacgtggatcctgagaactoaagctcctgggcaacgtgctggtggtotgctggctcgccaccatggcagtgaatcaccccgctgctgcaggctgagmcagaaggtg 
gtggctggtgtoccaatgccctggcccacagatatcactaa 

atgctgactgctgaggagaaggctgccgtcaccgcc^ggggcaaggtgaaagtggatgaagtogtggtgaggccctgggcaggctgctggtotctacccctggactcagaggtomgagtccmg 
gggactotccactgctgatgctgmtgaacaaccctaaggtgaaggcccatggcaagaaggtgctagatocmagtaatggcatgaagcatctcgatgacctcaagggcaccmgctgcgctgagtga 
gctgcactgtgataagctgcatgtggatcctgagaac^caagctcctgggcaacgtgctagtggtotgctggctcgcaa^ggcaaggaatoaccccggtgctgcaggctgacmcagaaggtggtg 
gctggtgtggccaatgccctggcccacagatatcatoa 

atggtgcatctgtccagtgaggagaagtctgcggtcactgccctgtggggcaaggtgaatgtggaagaagtogtggtgaggccctgggcaggctgctggtotctacccatggacccagaggtotoga 
tccmggggacctgtcctctgcaaatgctgmtgaacaatcctaaggtgaaggctcatggcaagaaggtgctggctgcctoagtgagggtctgagtcacctggacaacctcaaaggcaccmgctaagct 
gagtgaactgcactgtgacaagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggtottgtgctgtctcatca^ggcaaagaatcactcctcaggtgcaggctgcct^^ 
gtggtggctggtgtggccaatgccctggctcacaaataccactga 

atggtgcactoactctgaggagaagaactgcatcactaccatctggtctaaggtgcaggtoaccagactggtggtgaggccctogcaggatgctcgtotctacccctggaccaccaggi ii ii iggga 
gcmggtgatctgtcctctcctggcgctgtcatgtcaaa^ctaaggtcaagcccatggtgctaaggtgtoacctccttcggtgaagcagtcaagcamggacaacctgaagggtactotgccaagto 
agtgagctccactgtgacaagctgcatgtggaccctgagaactoaagatgctggggaatatcatotgatctgcctggctgagcacmggcaagga^actcctgaatgtcaggtoctogcagaa 
cgtggctggagtocccatgccctggcccacaagtaccactaa 

atggtgcactggactgctgaggagaagcagctcatcaccggcctctggggcaaggtcaatgtggccgaatgtggggccgaagccctggccaggctgctgatcgtctacccctggacccagaggtomg 
tccmgggaacctctccagccccactgccatcctogcaaccccatggtccgcgcccacggcaagaaagtgctcacctccmggggatgctgtgaagaacctggacaacatcaagaacaccttctcccaac 
tgtccgaactgcatotgacaagctgcatgtggaccccgagaactoaggctcctgggtgacatcctcatcatotcctggccgcccactoagcaaggactoactcctgaatgccaggctgcctggcagaa 
gctggtccgcgtggtggcccatgccctggctcgcaagtaccactaa 
Figure 2 The 2D graphical representation of the complete coding sequence of the β-globin gene of chimpanzee.



are mammals, and opossum is the most remote species from the
remaining mammals. Similar results have been obtained in
other papers by different approaches [2,5,7,9,33].
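
The observations drawn from Table 5 can be reproduced directly from the matrix. In the Python sketch below, the upper triangle of Table 5 is typed in row by row; `SPECIES`, `UPPER`, and `pairs` are illustrative names, and the chimpanzee-rat entry, printed as "1 0.645" in the extracted table, is read as 10.645.

```python
SPECIES = ["Human", "Chimpanzee", "Gorilla", "Lemur", "Rat", "Mouse",
           "Goat", "Bovine", "Rabbit", "Opossum", "Gallus"]

# Upper triangle of Table 5 (row i holds the distances to species i+1, i+2, ...).
UPPER = [
    [5.2500, 4.3359, 8.5891, 10.670, 9.7047, 8.2219, 8.1438, 7.8281, 15.6078, 16.7109],
    [1.1266, 8.0297, 10.645, 9.6016, 8.4375, 9.3219, 9.6000, 14.2578, 15.8734],
    [7.8688, 9.9625, 8.6063, 7.6734, 8.5578, 8.5547, 13.9719, 14.8781],
    [8.7219, 9.5500, 7.1328, 9.3891, 5.6891, 12.9281, 15.2000],
    [6.0750, 7.0484, 9.3641, 9.6578, 13.5906, 14.1219],
    [9.4953, 9.2641, 10.7984, 12.3406, 12.3688],
    [5.2625, 8.7219, 11.9703, 14.5359],
    [9.2906, 12.5922, 15.0234],
    [14.8984, 15.6953],
    [14.2750],
]

# Flatten the triangle into {(species_a, species_b): weight deviation}.
pairs = {}
for i, row in enumerate(UPPER):
    for k, value in enumerate(row):
        pairs[(SPECIES[i], SPECIES[i + 1 + k])] = value

# The three most similar pairs (smallest weight deviation).
closest = sorted(pairs, key=pairs.get)[:3]
print(closest)

# Smallest deviation of any pair that involves Opossum or Gallus.
remote = min(v for (a, b), v in pairs.items()
             if {"Opossum", "Gallus"} & {a, b})
print(remote)
```

The three closest pairs are chimpanzee-gorilla, human-gorilla, and human-chimpanzee, and every pair involving opossum or Gallus has a weight deviation above 11 (the smallest is goat-opossum at 11.9703).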

To test the validity of our method, existing results for the
degree of similarity/dissimilarity between the coding
sequences of β-globin genes of several species and the
coding sequence of the human β-globin gene, obtained by
approaches using alternative DNA sequence descriptors
[2,5,7,9], are listed in Table 6 for comparison.

From Table 6, we can see that human-gorilla and
human-chimpanzee are the two most similar species pairs when
adopting (A) the method of our work, (B) the method of [2],
(C) the method of [5], and (D) the method of [7], which is
in accordance with the fact that gorilla and chimpanzee are
the two species closest to human. When adopting (E) the
method of [9], however, the most similar species pair is
human-goat, which is obviously not correct. In addition,
human-Gallus and human-opossum are the two most dissimilar
species pairs when adopting (A) the method of our work,
(C) the method of [5], and (E) the method of [9], which is
in accordance with the fact that Gallus is not a mammal,
whereas the others are mammals, and opossum is the most
remote species from the remaining mammals. However, when
adopting (D) the method of [7], the two most dissimilar
species pairs are human-opossum and human-lemur, which is
also not reasonable.
Table 5 The similarity/dissimilarity matrix for the coding sequences of Table 4 based on the weight deviation

            Human      Chimpanzee Gorilla    Lemur      Rat        Mouse      Goat       Bovine     Rabbit     Opossum    Gallus
Human       0          5.2500     4.3359     8.5891     10.670     9.7047     8.2219     8.1438     7.8281     15.6078    16.7109
Chimpanzee             0          1.1266     8.0297     10.645     9.6016     8.4375     9.3219     9.6000     14.2578    15.8734
Gorilla                           0          7.8688     9.9625     8.6063     7.6734     8.5578     8.5547     13.9719    14.8781
Lemur                                        0          8.7219     9.5500     7.1328     9.3891     5.6891     12.9281    15.2000
Rat                                                     0          6.0750     7.0484     9.3641     9.6578     13.5906    14.1219
Mouse                                                              0          9.4953     9.2641     10.7984    12.3406    12.3688
Goat                                                                          0          5.2625     8.7219     11.9703    14.5359
Bovine                                                                                   0          9.2906     12.5922    15.0234
Rabbit                                                                                              0          14.8984    15.6953
Opossum                                                                                                        0          14.2750
Gallus                                                                                                                    0
Figure 4 Phylogenetic tree based on the similarity matrix (Table 5).
Table 6 The similarity/dissimilarity of the coding sequences

Species      A         B        C        D          E
Chimpanzee   5.2500    0.0144   14.00    0.005069   0.863
Gorilla      4.3359    0.0125   13.63    0.006611   0.339
Lemur        8.5891    -        31.75    0.030894   1.188
Rat          10.670    0.1377   41.65    0.015539   1.966
Mouse        9.7047    0.1427   30.27    0.015700   0.735
Goat         8.2219    0.1161   31.39    0.020980   0.311
Bovine       8.1438    0.0773   30.68    0.017700   2.489
Rabbit       7.8281    0.1332   35.575   0.015788   1.372
Opossum      15.6078   -        48.701   0.033363   6.322
Gallus       16.7109   -        70.46    0.025801   7.170



4. Conclusion 

In this paper, we propose a new 2D graphical representation
for DNA sequences based on triplets. In association with a
newly introduced concept of triplet weight and a newly
designed measure of similarity named weight deviation, we
propose a new method for the similarity analysis of DNA
sequences. The method requires no matrix computation and
provides a reasonable and useful approach for both
computational scientists and molecular biologists to analyze
DNA sequences effectively.